Balancing a Pipeline

Author

  • Per Brinch Hansen

Abstract

Reduction of a matrix to triangular form plays a crucial role in the solution of linear equations. In this chapter, I analyze a pipeline algorithm for Householder reduction (Brinch Hansen 1990). The pipeline is folded several times across an array of processors to achieve approximate load balancing. The pipeline inputs, transforms, and outputs a matrix column by column. During the computation, the columns are distributed evenly among the processors. The computing time per column decreases rapidly from the first to the last column, so the performance of the algorithm depends mainly on the order in which the columns are distributed among the processors. The simplest idea is to store a block of columns with consecutive indices in each processor (Ortega 1988). Block storage performs poorly because it assigns the most time-consuming columns to a single processor and leaves much less work for the other processors. It is much better to distribute the columns cyclically among the processors, so that each processor holds a similar mixture of columns. This storage pattern is called wrapped mapping or scattered decomposition (Ortega 1988, Fox 1988). A third method is reflection storage, in which the columns are distributed one at a time by sweeping back and forth across the processors several times (Ortega 1988). The folded pipeline combines block and reflection storage. On a Computing Surface with 25 transputers, the Householder pipeline achieves an efficiency of 81% for a 1250×1250 real matrix.
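The effect of these storage orders can be illustrated with a small load-balance calculation. The Python sketch below is not Brinch Hansen's program. It assumes a hypothetical cost model in which the work attached to column k (0-indexed) of an n-column matrix is proportional to (n - k)^2, it ignores communication, and its `folded` mapping is only one plausible reading of "combines block and reflection storage": blocks of consecutive columns laid back and forth across the processors, controlled by a hypothetical `sweeps` parameter. All function names are illustrative.

```python
"""Load-balance comparison of four column-to-processor mappings.

Hypothetical cost model (an assumption, not taken from the chapter):
the work attached to column k (0-indexed) of an n-column matrix is
(n - k)**2, so early columns are far more expensive than late ones.
Communication time is ignored entirely.
"""
from collections.abc import Callable

ColumnMap = Callable[[int, int, int], int]  # (k, n, p) -> processor index


def work(k: int, n: int) -> int:
    # Assumed quadratic cost: k = 0 costs n**2, k = n - 1 costs 1.
    return (n - k) ** 2


def block(k: int, n: int, p: int) -> int:
    # Block storage: n // p consecutive columns per processor
    # (assumes p divides n).
    return k // (n // p)


def wrapped(k: int, n: int, p: int) -> int:
    # Wrapped mapping (scattered decomposition): deal columns cyclically.
    return k % p


def reflect(q: int, p: int) -> int:
    # Item q goes left to right on even sweeps, right to left on odd ones.
    sweep, pos = divmod(q, p)
    return pos if sweep % 2 == 0 else p - 1 - pos


def reflection(k: int, n: int, p: int) -> int:
    # Reflection storage: one column at a time, back and forth.
    return reflect(k, p)


def folded(k: int, n: int, p: int, sweeps: int = 10) -> int:
    # One reading of a folded pipeline (hypothetical): cut the columns into
    # p * sweeps blocks of consecutive columns and lay the blocks back and
    # forth across the processors. sweeps = 1 gives block storage; blocks of
    # one column give reflection storage. Assumes p * sweeps divides n.
    return reflect(k // (n // (p * sweeps)), p)


def loads(mapping: ColumnMap, n: int, p: int) -> list[int]:
    # Total assumed work held by each processor.
    totals = [0] * p
    for k in range(n):
        totals[mapping(k, n, p)] += work(k, n)
    return totals


def efficiency(totals: list[int]) -> float:
    # Average load divided by the busiest processor's load: a ceiling on
    # parallel efficiency under this cost model.
    return sum(totals) / (len(totals) * max(totals))


if __name__ == "__main__":
    n, p = 1250, 25  # matrix order and processor count quoted in the abstract
    for name, mapping in [("block", block), ("wrapped", wrapped),
                          ("reflection", reflection), ("folded", folded)]:
        print(f"{name:>10}: {efficiency(loads(mapping, n, p)):.3f}")
```

Under this assumed model, block storage overloads the processor that holds the first columns, while the wrapped, reflection, and folded orders all come close to even loads. The ratios are load-balance ceilings only; they are not comparable with the measured 81% efficiency, which also reflects communication.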


Similar resources

Synchronization-Free Parallel Collision Detection Pipeline

We present a first parallel and adaptive collision detection pipeline running on a multi-core architecture. This pipeline integrates a first global synchronization-free parallelization of its major steps and makes it possible to adapt the parallelism repartition dynamically during the simulation. We propose to break the sequentiality of the pipeline by simultaneously executing the two main phases (broad ...

Towards a Dynamic Parallel Database Machine: Data Balancing Techniques and Pipeline (Ecole Normale Supérieure de Lyon)

The fast development over the last years of high-performance multicomputers makes them attractive candidates as the base technology for scalable and performance-oriented database applications. In this paper we address the problem of how to process utility commands while the system remains operational and the data remain available for concurrent access. In particular we focus on the on line reorg...

Dynamic versus Static Load Balancing in a Pipeline Computation

We examine load balancing in a simple pipeline computation, in which a large number of data sets is pipelined through a series of tasks and load balancing is performed by distributing several available processors among the tasks. We compare the performance of the optimal static processor assignment to the performances of three dynamic processor assignment algorithms. Models are derived which al...

Tools for Mapping, Load Balancing and Monitoring in the LOGFLOW Parallel Prolog Project

LOGFLOW is an all-solution parallel logic programming system able to exploit OR-parallelism and pipeline AND-parallelism of Prolog programs. The LOGFLOW project is intended to implement Prolog in massively parallel distributed memory multicomputers. Porting LOGFLOW to a workstation cluster resulted in a variant of LOGFLOW called WS-LOGFLOW. Implementation of LOGFLOW both on multi-transputers an...

A Proposal for a Sort-Middle Cluster Rendering System

Cluster rendering systems often take a sort-first, sort-last, or a hybrid of these two approaches because it is generally assumed that the hardware pipeline cannot be split between the geometric and rasterization stages. These approaches often limit the scalability of the system because they introduce either load balancing problems or high contention for the communication network. We propose a ...

Publication date: 2005